Specification Mining with Few False Positives
Abstract
Formal specifications can help with program testing, optimization, refactoring, documentation, and, most importantly, debugging and repair. Unfortunately, formal specifications are difficult to write manually, and techniques that infer specifications automatically suffer from 90–99% false positive rates. Consequently, neither option is currently practiced for most software development projects. We present a novel technique that automatically infers partial correctness specifications with a very low false positive rate. We claim that existing specification miners yield false positives because they assign equal weight to all aspects of program behavior. Our technique instead weights behavior unequally: for example, we grant less credence to duplicate code, infrequently tested code, and code that has been changed often or recently. By using this additional information from the software engineering process, we are able to dramatically reduce the false positive rate. We evaluate our technique in two ways: as a preprocessing step for an existing specification miner and as part of a novel specification inference algorithm. Our technique identifies which traces are most indicative of program behavior, which allows off-the-shelf mining techniques to learn the same number of specifications using 60% of their original input. This results in many fewer false positives than state-of-the-art techniques, while still finding useful specifications on over 800,000 lines of...
Similar Resources
Controlling False Positives in Association Rule Mining
Association rule mining is an important problem in the data mining area. It enumerates and tests a large number of rules on a dataset and outputs rules that satisfy user-specified constraints. Due to the large number of rules being tested, rules that do not represent a real systematic effect in the data can satisfy the given constraints purely by random chance. Hence association rule mining often...
Reducing False Positives in the Construction of Adjective Scales
Many adjectives that appear to be synonyms of one another differ in their intensity. Distinguishing the nuances between adjective synonyms is vital to linguistic understanding of a language, but WordNet currently does not encode the relative intensities of adjective synonyms that lie on the scale. Sheinman & Tokunaga (2009) proposed a solution of constructing Adjective Scales by data mining a w...
Data mining and machine learning - Towards reducing false positives in intrusion detection
Intrusion Detection Systems (IDSs) are used to monitor computer systems for signs of security violations. Having detected such signs, IDSs trigger alerts to report them. These alerts are presented to a human analyst, who evaluates them and initiates an adequate response. In practice, IDSs have been observed to trigger thousands of alerts per day, most of which are mistakenly triggered by benign...
An Outlier Detection-Based Alert Reduction Model
Intrusion Detection Systems (IDSs) are widely deployed as unauthorized activities and attacks increase. However, they often overload security managers by triggering thousands of alerts per day, and up to 99% of these alerts are false positives (i.e., alerts that are triggered incorrectly by benign events). This makes it extremely difficult for managers to correctly analyze security state a...
Attribute Weighting with Adaptive NBTree for Reducing False Positives in Intrusion Detection
In this paper, we introduce new learning algorithms for reducing false positives in intrusion detection. They are based on decision tree-based attribute weighting with an adaptive naïve Bayesian tree, which not only reduces false positives (FP) to an acceptable level but also scales up the detection rates (DR) for different types of network intrusions. Due to the tremendous growth of network-based se...
Publication date: 2009